Statistical Markovian Data Modeling for Natural Language Processing
Abstract
Markov chain theory is a popular statistical tool in applied probability that is quite useful in modelling real-world computing applications. In recent years, there has been growing interest in employing Markov chain theory for statistical learning of temporal (i.e., time series) data. A wide range of applications utilize Markov concepts, including computational linguistics, image processing, communications, bioinformatics, and finance systems. In fact, research based on Markov processes has been applied with great success in many of the most efficient natural language processing (NLP) tools. Hence, this paper explores Markov chain theory and its extension, hidden Markov models (HMMs), in NLP applications. This paper also presents some aspects related to Markov chains and HMMs such as creating transition and observation matrices, calculating data sequence probabilities, extracting the hidden states, and profile
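As an illustration of the operations the abstract lists, the following is a minimal Python sketch, not the paper's implementation. It estimates a transition matrix from tag sequences, computes the probability of a sequence under a first-order Markov chain, and extracts hidden states with the standard Viterbi algorithm. The function names, the toy part-of-speech tags, and every probability value are assumptions chosen only for the example.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def sequence_probability(seq, start, trans):
    """P(seq) under a first-order Markov chain; unseen events get probability 0."""
    prob = start.get(seq[0], 0.0)
    for prev, nxt in zip(seq, seq[1:]):
        prob *= trans.get(prev, {}).get(nxt, 0.0)
    return prob


def viterbi(observations, states, start, trans, emit):
    """Most likely hidden-state path of an HMM for the given observations."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               predecessor state on that path)
    best = [{s: (start[s] * emit[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans[p].get(s, 0.0) * emit[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy training data (hypothetical): part-of-speech tag sequences.
tag_sequences = [["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"]]
transitions = estimate_transitions(tag_sequences)
chain_prob = sequence_probability(["DET", "NOUN", "VERB"], {"DET": 1.0}, transitions)

# Toy HMM (hypothetical values): hidden tags emit observed words.
hmm_states = ["NOUN", "VERB"]
hmm_start = {"NOUN": 0.6, "VERB": 0.4}
hmm_trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
hmm_emit = {
    "NOUN": {"dogs": 0.5, "cats": 0.4, "bark": 0.1},
    "VERB": {"dogs": 0.1, "cats": 0.1, "bark": 0.8},
}
best_path, best_prob = viterbi(["dogs", "bark"], hmm_states, hmm_start, hmm_trans, hmm_emit)
```

With these toy values, `transitions["DET"]` splits evenly between `NOUN` and `ADJ`, `chain_prob` is 0.5, and `best_path` is `["NOUN", "VERB"]` with a path probability of about 0.168. For longer sequences, a practical implementation would typically work with log probabilities to avoid numerical underflow and smooth the counts so that unseen transitions do not receive zero probability.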
Similar resources
A Survey on Statistical Approaches to Natural Language Processing
This survey attempts to catch up with the recent growth of interest in statistical approaches to natural language processing based on large corpora. First of all, a historical overview traces back to the 1950s, when Noam Chomsky proposed his phrase structure transformation grammar and rejected Markov process natural language modeling. With the development of large corpora and language modeling i...
Adaptive Natural Language Processing
In the past decades of NLP, there has been a steady shift away from rule-based, linguistically motivated modeling towards statistical learning and the induction of unsupervised feature representations. However, natural language components used in today’s NLP pipelines are still static in the sense that their statistical model or rule-base is created once, then subsequently applied without furth...
Trameur: A Framework for Annotated Text Corpora Exploration
Corpus resources with complex linguistic annotations are becoming increasingly important in the work of language specialists. They often need to perform extensive corpus research, including Natural Language Processing (NLP), statistical modelling and data visualisation. Our software system, called Trameur, aims at making these analyses possible within a single graphical user interface. It relie...
Language Modeling Approaches to Information Retrieval
This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal ...
Using Domain-Specific Knowledge to Classify E-negotiations
Texts exchanged in business-related Computer-Mediated Communication, or CMC, differ from texts exchanged in other business situations. CMC data have a high concentration of non-standard textual features. The fast-growing amount of business CMC data offers opportunities for the application of statistical Natural Language Processing and Machine Learning methods, especially for text-classification...